[00:00.000 --> 00:05.140]  Hi, everyone. I'm Alex Sierra. I'm going to be talking about "Possible SaaSpocalypse:
[00:05.140 --> 00:10.720]  The Complexity and Power of AWS Cross-Account Access." Just a very quick tidbit about
[00:10.720 --> 00:17.160]  myself. I am the founder of Tenshi Security, as was just described. I'm based out of São Paulo,
[00:17.160 --> 00:24.200]  Brazil, and I'm currently focusing exclusively on cloud security as my day job at the new company.
[00:24.200 --> 00:30.800]  This is where my interest in this topic came from: engagements with customers
[00:31.340 --> 00:36.300]  that had a lot of SaaS providers representing an interesting level of risk
[00:36.300 --> 00:45.720]  to them. So let's get started. Why should you care? Why should you be interested in learning
[00:45.720 --> 00:52.180]  more about cross-account access in AWS? The thing is, as the ATT&CK framework mentions,
[00:52.180 --> 00:57.960]  abusing a relationship that already exists between a vendor and a customer, or what's also
[00:57.960 --> 01:03.480]  known as supply chain attacks, can be a very effective way for an attacker to gain a foothold
[01:03.480 --> 01:08.940]  into an organization. There are very well-known APT groups that have used that tactic in the past.
[01:08.960 --> 01:15.180]  There are very well-known incidents that were brought about by this sort of technique.
[01:15.320 --> 01:20.420]  And especially when you talk about the cloud, a lot of companies are already all in on
[01:20.420 --> 01:25.540]  the cloud (startups, new projects, even at large companies), and existing infrastructure is being
[01:25.540 --> 01:33.920]  migrated to the cloud at a frantic pace. And it becomes a lot easier to grant access to your cloud
[01:33.920 --> 01:40.160]  accounts to all sorts of SaaS providers: CSPM, or cloud security posture management,
[01:40.160 --> 01:46.700]  solutions, single-pane-of-glass, multi-cloud goodness for managing all the things,
[01:46.700 --> 01:53.740]  identity providers, backup solutions, IT management. All sorts of products are taking
[01:53.740 --> 02:01.300]  advantage of the fact that using the cloud service provider APIs to gain access to a customer's
[02:01.300 --> 02:08.680]  account allows you to do a lot more automation and to scale those services a lot better
[02:08.680 --> 02:13.820]  than on-premises environments, which would have been a lot more complex to integrate with.
[02:13.820 --> 02:18.660]  And so the tendency is that those SaaS providers are going to be connected to a lot of customers,
[02:18.660 --> 02:22.840]  gaining access to a lot of accounts, and that's a huge concentrator of risk.
[02:22.940 --> 02:29.460]  If you have a single provider that has access to thousands of different AWS accounts around the
[02:29.460 --> 02:37.120]  world with a decent amount of privileges, that'll be a jackpot for any attacker. So that's a huge
[02:37.120 --> 02:43.800]  risk concentrator, which is why, as defenders, we need to get those access grants right.
[02:44.380 --> 02:52.620]  And so I'm going to start with a very quick, you know, thousand-foot overview of AWS IAM,
[02:52.620 --> 02:59.080]  just for people that are not very familiar with the concepts. But AWS has a few very interesting
[02:59.820 --> 03:06.260]  concepts that we need to cover. First of all, on AWS, AWS accounts are self-contained.
[03:06.260 --> 03:11.020]  The identities exist within the account, the privileges that they have exist within the
[03:11.020 --> 03:16.540]  account, and the resources they access, the APIs, the actions they execute, all live within an
[03:16.540 --> 03:24.560]  account. It's not possible for, say, an IAM user in one account to act directly on another account,
[03:24.560 --> 03:31.420]  right? Everything that happens typically happens inside a single account. It's very different from,
[03:31.420 --> 03:37.320]  for example, Azure, where you have an Active Directory user and it has access to several
[03:37.320 --> 03:43.560]  subscriptions, which would roughly be the equivalent of AWS accounts. It's the same user;
[03:43.560 --> 03:49.700]  they're just being given privileges on different subscriptions. That's not what happens on AWS,
[03:49.700 --> 03:56.260]  right? So there are some resources on AWS that you can share across accounts explicitly,
[03:57.320 --> 04:03.080]  most infamously S3 buckets, which, since they can be shared with the world, people
[04:03.080 --> 04:07.880]  tend to share when they shouldn't. But other things like EBS snapshots,
[04:07.880 --> 04:16.440]  AMIs, and if you say AMIs, I hate you, but AMIs are other kinds of resources that you can
[04:17.420 --> 04:23.860]  share either publicly with the whole world or with specific AWS accounts. But more generally,
[04:23.860 --> 04:28.440]  if you are doing some sort of work, if you are a SaaS provider and need to have some sort of access
[04:28.440 --> 04:35.160]  into a customer AWS account, you need to have some sort of identity inside that account to be
[04:35.160 --> 04:39.960]  able to be given privileges and access those actions and resources inside the customer account.
[04:40.580 --> 04:48.120]  And so the way that this has been done typically is either you create something like an IAM user,
[04:48.120 --> 04:54.200]  which is one sort of identity that has long-lasting credentials. For example,
[04:54.200 --> 04:58.460]  you can have a password to access the console, which, for managed service providers,
[04:58.460 --> 05:02.760]  is probably what you get. Or if you're doing automated access, which is more
[05:02.760 --> 05:08.600]  typical in the SaaS providers, you would have an access key ID and a secret access key, right?
[05:08.940 --> 05:15.300]  But AWS also provides another way of doing this, which is using STS, or the Security Token Service,
[05:15.300 --> 05:21.140]  where you create a role, which is a sort of identity that has no long-term credentials of its own.
[05:21.780 --> 05:29.360]  And the way this works is you assign privileges to a role, you say who can assume that role or
[05:29.360 --> 05:35.260]  become that role. And then the security token service will provide temporary credentials.
[05:35.260 --> 05:38.840]  And there's an extra element: you get the access key ID and the secret access key
[05:38.840 --> 05:48.900]  plus a session token that you can use to call AWS APIs, limited by those privileges,
[05:48.900 --> 05:52.300]  for a specific period of time. So these are temporary credentials.
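As a rough illustration of those three-part temporary credentials, here is a minimal Python sketch, assuming the vendor uses boto3 (the talk does not name an SDK). `assume_role` with `RoleArn`, `RoleSessionName`, and `DurationSeconds` is boto3's STS call; the helper name `temporary_session_kwargs` is hypothetical, and the STS client is passed in so the logic can be exercised without AWS.

```python
from typing import Any, Dict, Tuple


def temporary_session_kwargs(sts_client: Any, role_arn: str, session_name: str,
                             duration_seconds: int = 900) -> Tuple[Dict[str, str], Any]:
    """Assume a role and return (kwargs for boto3.Session, expiration).

    Hypothetical helper: sts_client is assumed to behave like boto3.client("sts").
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,  # capped by the role's maximum session duration
    )
    creds = response["Credentials"]
    # Temporary credentials come in three parts (an IAM user's keys have two);
    # all three must be presented together, and they stop working at Expiration.
    session_kwargs = {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
    return session_kwargs, creds["Expiration"]
```

With boto3 available, `boto3.Session(**session_kwargs)` would yield a session scoped to the role, which a worker can keep in memory and discard once the credentials expire.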
[05:53.700 --> 05:59.200]  And so what you need to do to be able to do this is you have to configure the trust policy
[05:59.200 --> 06:03.260]  on the role, which says who can assume that role. You assign it the proper permissions on
[06:03.260 --> 06:07.220]  the permissions policy. When you are the administrator and you are creating the
[06:07.220 --> 06:12.040]  role, you can set a maximum session duration. But that's not enough. The identity that's
[06:12.040 --> 06:17.980]  going to assume that role also needs to have privileges to call the STS assume role action,
[06:17.980 --> 06:27.660]  which is very aptly named. And so this is what option A of how a SaaS provider can access
[06:28.300 --> 06:34.920]  AWS accounts for customers looks like. And this looks like what you would do on Azure as well,
[06:34.920 --> 06:42.840]  and what has been done over time for a variety of solutions. Essentially, you create an IAM user,
[06:42.840 --> 06:49.420]  you assign it the necessary privileges with identity policies, you create long-lived access
[06:49.420 --> 06:54.440]  keys, and then the customer somehow uses a trusted channel, don't you love those,
[06:54.440 --> 07:00.920]  to send those credentials to the MSP. It can be as simple as going to a web portal for
[07:01.680 --> 07:08.380]  self-onboarding, right? And then what happens is inside the vendor, that's where the trick is,
[07:08.380 --> 07:14.820]  any machine, any Lambda, any container, or any person that needs to perform actions on that
[07:14.820 --> 07:19.540]  customer's account, will need to be able to retrieve those credentials. So those credentials
[07:19.540 --> 07:25.800]  need to be stored long term. If they are encrypted, they need to use reversible encryption,
[07:25.800 --> 07:30.920]  because every time that vendor needs to call APIs on behalf of that customer to access
[07:30.920 --> 07:37.820]  their AWS account, those credentials need to be there. And that's kind of the problem,
[07:37.820 --> 07:44.420]  right? Credentials are secret, right? If they are leaked, that's basically everything that
[07:44.420 --> 07:49.920]  an attacker needs to be able to access that customer's account. So what happens is there's
[07:50.080 --> 07:56.480]  a lot of burden associated with maintaining secrets. You need to rotate them regularly,
[07:56.480 --> 08:00.360]  right? The customer will probably mandate that. So that's a lot of work for the customer,
[08:00.360 --> 08:04.880]  that's a lot of work for the vendor, the SaaS vendor, especially if a lot of customer
[08:04.880 --> 08:12.300]  accounts are involved, right? They're very easy to lose, right? I mean, in all sorts of ways.
[08:12.300 --> 08:17.980]  And since every time they need to be used, they will be accessed, that's an opportunity to lose
[08:17.980 --> 08:23.660]  them, to leak them, right? So someone can break into the worker machines. If there are human
[08:23.660 --> 08:30.800]  beings involved, their machines can be hacked. Backups can be accidentally published somewhere.
[08:30.800 --> 08:36.460]  All sorts of things can happen where those secrets leak. And if that happens, if the
[08:36.460 --> 08:43.360]  secrets that a SaaS vendor uses to access 1,000 customer AWS accounts are lost,
[08:43.360 --> 08:50.360]  the only way to get back to a secure state is to ask all of those customers
[08:51.040 --> 08:56.360]  to rotate those credentials, to invalidate the old credentials, generate new ones and
[08:56.360 --> 09:00.540]  reconfigure them on the MSP. So that generates a lot of work, a lot of downtime, and it has to
[09:00.540 --> 09:06.300]  be done in a hurry to minimize the exposure window, right? And so if one of your customers
[09:06.300 --> 09:12.700]  didn't get the email, is on vacation, whatever, that environment will continue to be exposed,
[09:12.700 --> 09:18.920]  and there's nothing that the SaaS vendor can do except plead with their customers to rotate the
[09:18.920 --> 09:25.600]  secrets and be exposed to the liability of any losses the customers incur because of their
[09:25.600 --> 09:33.060]  breach, because of their leaking of those secrets. So yeah, that's not ideal, right?
[09:34.040 --> 09:39.660]  Option B looks more interesting. That's where we use roles to do cross-account access.
[09:39.660 --> 09:45.200]  And the way this works is you create a role on the customer environment, and the trust policy says
[09:45.840 --> 09:52.440]  the vendor AWS account can assume that role. And it's important to keep in mind this is being
[09:52.440 --> 09:58.800]  assigned to the entire account. It's up to the vendor to control who inside their account
[10:00.100 --> 10:05.080]  gets the STS assume role privileges to assume the role. As far as the customer is concerned,
[10:05.080 --> 10:10.600]  they're trusting the entire SaaS vendor account. They have to trust an account ID, right? So they
[10:10.600 --> 10:15.460]  create a role, they set up a trust policy saying, I trust this vendor's account, and then they
[10:15.460 --> 10:21.380]  assign the privileges to that role using the permissions policy that the vendor needs to
[10:21.380 --> 10:27.560]  perform their work. And now on the vendor side, what they do is they need to control essentially
[10:27.560 --> 10:35.000]  who gets to call STS assume role on the customer's role, right? That's it. Whenever a machine,
[10:35.000 --> 10:40.140]  an instance, a container, Lambda, or a person needs to work on that customer's account,
[10:40.140 --> 10:46.780]  they need to know what the unique identifier of that role is, its ARN, or Amazon Resource Name,
[10:46.780 --> 10:54.320]  in Amazon parlance. And so they assume that role, they get temporary credentials, they don't store them
[10:54.320 --> 10:59.240]  anywhere, those are ephemeral credentials, they just keep them in memory, use them for the few API
[10:59.240 --> 11:04.740]  calls they need to do, and then they throw them away. And they repeat that process each time,
[11:04.740 --> 11:11.420]  right? And so a lot of risk is already being minimized here, because those credentials are
[11:11.420 --> 11:18.320]  temporary. Even if they do leak, they will last a short amount of time. And if you have thousands
[11:18.320 --> 11:22.920]  of accounts, but you're only accessing a few at a time, even if an attacker compromises all of
[11:22.920 --> 11:30.960]  your worker instances, they can probably read credentials in memory, say. But if they don't also have
[11:30.960 --> 11:35.240]  the ability to call assume role, they will only get credentials for a few customers, and those will
[11:35.790 --> 11:40.660]  not last long. So there's a lot of mitigation already taking place. But the most important one
[11:40.660 --> 11:46.000]  is, this is the only state that the vendor needs to keep. They only need to know what
[11:46.000 --> 11:52.680]  ARN they need to assume, the ARN of the role that they need to assume. This is not a secret.
[11:52.680 --> 11:58.360]  Even if this leaks, if the database containing all the customer ARNs that they can
[11:58.360 --> 12:04.280]  connect to is leaked, there's nothing an attacker can do with it, because they still need to break
[12:04.280 --> 12:11.320]  into the SaaS vendor account and call STS assume role from that account. Because this is what's
[12:11.320 --> 12:17.600]  happening: any identities in the customer account are being authenticated by
[12:17.600 --> 12:23.640]  AWS. Any identities inside the vendor account are being authenticated by AWS. And then AWS
[12:23.640 --> 12:31.220]  acts as a middleman to say, if a request comes in to assume that role, AWS is assuring that this
[12:31.220 --> 12:37.840]  is coming from a principal, from an identity that really belongs to the SaaS vendor account
[12:37.840 --> 12:43.940]  and has been properly authenticated. You don't need another secret. AWS is kind of making the
[12:43.940 --> 12:50.960]  trust transitive in that sense. And so an attacker that just has the state the vendor keeps here
[12:50.960 --> 12:57.760]  still can't do anything unless they also hack into the vendor. But it's not perfect. This is still naive
[12:57.760 --> 13:03.580]  as I've described it, because there's a problem. We have what's called the confused deputy problem.
[13:03.680 --> 13:12.360]  So here's what happens. Imagine you go into G Suite today. And you say, look, I own,
[13:12.360 --> 13:19.040]  I don't know, CapitalOne.com, right, or Google.com. And I want to create a G Suite domain,
[13:19.040 --> 13:23.000]  you know, and create email addresses and send emails on behalf of that domain.
[13:23.500 --> 13:29.220]  What they will tell you is, first of all, you need to prove to me that you own that domain.
[13:29.220 --> 13:36.080]  Otherwise, you're just someone that's trying to do something evil here and impersonate people
[13:36.080 --> 13:41.680]  from that place. You need to prove first, somehow that you own that DNS domain. Same if you're
[13:41.680 --> 13:51.080]  issuing certificates on Let's Encrypt, they'll ask you to publish, like, a public TXT record
[13:51.080 --> 13:56.560]  proving that you are the admin of that domain. Otherwise, people could just issue certificates
[13:56.560 --> 14:01.300]  for domains they don't own and set up phishing sites and things like that.
[14:01.540 --> 14:05.720]  This is the same problem that's going to happen here. Someone could sign up for one of the SaaS tools,
[14:05.720 --> 14:12.700]  find out a target's account ID, and pretend to own that account and say, look,
[14:13.230 --> 14:16.960]  I own this account ID. Can you please do a vulnerability scan? Or can you do a backup
[14:16.960 --> 14:23.440]  of the data there? And then if that account was set up previously by the customer,
[14:23.440 --> 14:29.720]  by the actual owner to work with that same SaaS vendor, the account does trust the SaaS vendor
[14:29.720 --> 14:35.720]  account. So it will work. The SaaS vendor will, let's say, scan, if it's a CSPM solution,
[14:35.720 --> 14:40.060]  it's going to scan that account twice, once on behalf of the actual owner and another time
[14:40.060 --> 14:44.980]  on behalf of the attacker. And so the attacker will have access to whatever information
[14:44.980 --> 14:50.940]  or functionality that SaaS vendor provides, for an account they don't own.
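To make the confused deputy concrete, here is a toy Python model, not real IAM evaluation: the naive trust policy only names the vendor account, so the vendor's request succeeds no matter which of its users asked for it. The account IDs, role names, and function names are hypothetical.

```python
VENDOR_ACCOUNT_ID = "111122223333"  # hypothetical SaaS vendor account

# Naive trust policy on the customer's role: the whole vendor account may assume it.
NAIVE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{VENDOR_ACCOUNT_ID}:root"},
        "Action": "sts:AssumeRole",
    }],
}


def vendor_can_assume(trust_policy: dict, caller_account: str) -> bool:
    """Toy check, not real IAM evaluation: does an Allow trust this account?"""
    caller = f"arn:aws:iam::{caller_account}:root"
    return any(
        stmt["Effect"] == "Allow"
        and stmt["Action"] == "sts:AssumeRole"
        and stmt["Principal"].get("AWS") == caller
        for stmt in trust_policy["Statement"]
    )


def handle_scan_request(requester: str, role_arn: str, trust_policies: dict) -> str:
    """The vendor acts on whatever role ARN a signed-up user submits."""
    # `requester` is never checked against the role's owner: AWS only sees the
    # vendor account asking, so the vendor becomes the "confused deputy".
    if vendor_can_assume(trust_policies[role_arn], VENDOR_ACCOUNT_ID):
        return f"scan results for {role_arn} sent to {requester}"
    return "access denied"
```

Calling `handle_scan_request` with the real owner and with an attacker who typed in the same role ARN returns results to both.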
[14:50.940 --> 14:59.400]  So that's a serious problem. And so what AWS did to solve this is to create something similar
[14:59.400 --> 15:06.060]  to what Let's Encrypt does or G Suite does, which is we're going to issue a challenge
[15:06.060 --> 15:14.200]  that proves that you are an admin of that target account. And actually, that you are an admin of
[15:14.200 --> 15:21.640]  that role that you're asking me to assume, to let you assume. And so they created what's called
[15:21.640 --> 15:28.480]  an external ID. This is a non-secret value, which is simply a challenge. The same way that
[15:28.480 --> 15:34.740]  the value that you're adding to a TXT record, to prove to Let's Encrypt or G Suite,
[15:34.740 --> 15:44.700]  that you are the admin of that DNS domain. This is simply a nonce, a token that the SaaS vendor
[15:44.700 --> 15:51.860]  will choose and say to the customer or to the prospective customer saying, if you really own
[15:51.860 --> 15:58.280]  this account, you're going to configure the trust policy on the role to only allow me in
[15:58.840 --> 16:06.100]  if this exact nonce is used, this exact external ID is used. And so on the left here, you see what
[16:06.100 --> 16:13.020]  that trust policy would look like. So you would have an allow for STS assume role. You would,
[16:13.020 --> 16:17.840]  on the customer side, configure that on the role; you would put the ID of the
[16:17.840 --> 16:24.920]  SaaS vendor AWS account as the principal, and then you would add a condition, which is something that's
[16:24.920 --> 16:33.000]  standard in IAM policy syntax, saying that sts:ExternalId needs to match exactly the
[16:33.560 --> 16:39.000]  value that was created by the SaaS vendor. And so what happens on the other side is that the
[16:39.000 --> 16:45.840]  vendor, when they do call assume role, besides telling AWS which role they want to assume,
[16:45.840 --> 16:53.020]  they will also pass an optional external ID. And so if the two values match, then the call will
[16:53.020 --> 16:58.640]  succeed, the temporary credentials for the role will be returned, and everything will be fine.
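A minimal sketch of both sides, written as Python dictionaries mirroring the IAM policy JSON. The `sts:ExternalId` condition key and the `ExternalId` parameter of boto3's `sts.assume_role` are real; the account ID, the external ID value, and the helper names are illustrative assumptions.

```python
import json

VENDOR_ACCOUNT_ID = "111122223333"                    # hypothetical vendor account
EXTERNAL_ID = "b2f6c1d4-7e0a-4c55-9d1e-2a8f3b6c9e10"  # example value chosen by the vendor


def trust_policy_with_external_id(vendor_account_id: str, external_id: str) -> dict:
    """Customer side: only let the vendor in when it presents this external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{vendor_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


def assume_role_params(role_arn: str, external_id: str,
                       session_name: str = "saas-worker") -> dict:
    """Vendor side: keyword arguments for boto3's sts.assume_role()."""
    return {"RoleArn": role_arn, "RoleSessionName": session_name,
            "ExternalId": external_id}


if __name__ == "__main__":
    print(json.dumps(trust_policy_with_external_id(VENDOR_ACCOUNT_ID, EXTERNAL_ID), indent=2))
```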
[16:58.940 --> 17:04.120]  Right? And so in a nutshell, this is what this looks like. The customer creates the role as
[17:04.120 --> 17:10.300]  before, but they add this extra condition to make sure that they're using an external ID
[17:10.860 --> 17:15.760]  that the SaaS vendor has created, right? And they again, assign the correct privileges.
[17:15.840 --> 17:22.800]  The SaaS vendor now has to do more work, though. What they have to do is to make sure that
[17:22.800 --> 17:28.920]  they are choosing the challenge, and that the challenge is unique, in the sense that they don't
[17:28.920 --> 17:36.680]  repeat or it's very unlikely that they will repeat that challenge, right? To make sure that
[17:38.040 --> 17:43.940]  anyone that can change the trust policy on the role they're assuming can be assumed to be an
[17:43.940 --> 17:49.880]  admin of that account and of that role, right? Once that happens, everything is the same.
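One way a vendor might generate and track these challenges, sketched in Python under the assumption that a random UUID is an acceptable nonce and that the vendor keeps a set of already-issued values (a real service would use its database). `CustomerIntegration`, `start_onboarding`, and `complete_onboarding` are hypothetical names.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional, Set


@dataclass(frozen=True)
class CustomerIntegration:
    """Hypothetical onboarding record; none of these fields is a secret."""
    tenant_id: str
    external_id: str
    role_arn: Optional[str] = None  # known once the customer has created the role


def start_onboarding(tenant_id: str, issued: Set[str]) -> CustomerIntegration:
    """The vendor picks the challenge; the customer never gets to choose it."""
    while True:
        candidate = str(uuid.uuid4())  # random UUID: collisions are very unlikely
        if candidate not in issued:    # ...but never hand out the same value twice
            issued.add(candidate)
            return CustomerIntegration(tenant_id=tenant_id, external_id=candidate)


def complete_onboarding(integration: CustomerIntegration, role_arn: str) -> CustomerIntegration:
    """Record the role ARN the customer reports; the external ID stays vendor-chosen."""
    return replace(integration, role_arn=role_arn)
```

The design choice that matters is that the customer only ever reports the role ARN back; there is no code path that accepts an external ID from the customer.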
[17:49.880 --> 17:54.500]  The thing that changed is now there's an additional piece of state as well that the
[17:54.500 --> 18:00.520]  vendors need to keep around: not only do they have to know what the customer role ARN is,
[18:00.520 --> 18:06.060]  but they also need to know what that challenge value is. But let's keep in mind, neither of
[18:06.060 --> 18:13.280]  those are secrets. Again, even if an attacker got hold of all of that information, if a database
[18:13.280 --> 18:20.980]  dump was leaked, they still don't have any access to customer accounts unless they were also able
[18:20.980 --> 18:28.680]  to somehow break into the SaaS vendor account and issue STS assume role calls from there,
[18:28.680 --> 18:37.540]  which is the key part, right? Where they need to be authenticated into the SaaS vendor account,
[18:37.540 --> 18:43.080]  right? So this is vastly superior to the scenario we had with the shared secret.
[18:43.280 --> 18:50.980]  Right? So a few caveats here that are important to keep in mind, right? The first one is that,
[18:51.740 --> 18:59.600]  as a vendor, in order to
[18:59.600 --> 19:04.780]  confirm that an account belongs to the person that's claiming it, you can't just
[19:04.780 --> 19:11.560]  do an assume role passing the external ID and see if it succeeds. Because here's the problem.
[19:12.190 --> 19:19.640]  If the customer incorrectly sets the trust policy to not have that condition,
[19:19.640 --> 19:23.860]  it just says, I trust that account ID, but they don't add the condition block
[19:23.860 --> 19:29.780]  that requires the external ID, that operation will succeed. Here's the thing. The way that
[19:29.780 --> 19:38.540]  condition keys work in IAM policies is that they're like environment variables, right? You've
[19:38.540 --> 19:44.420]  added a value that may or may not be used. But if it's not used, it's not going to have an effect,
[19:44.420 --> 19:52.160]  right? So basically, when the vendor calls STS assume role and adds the external ID on their end,
[19:52.160 --> 19:57.680]  unless the trust policy is checking for it, it won't have an impact. AWS is not treating this
[19:57.680 --> 20:04.380]  as a special value, that if it's not checked on the policy, that the evaluation is going to fail.
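Because an external ID that the trust policy never checks is simply ignored, a vendor's onboarding check has to test both directions. Below is a minimal Python sketch of that double check. The `assume` callable is an assumption: in practice it might wrap boto3's `sts.assume_role` and translate an AccessDenied error into `PermissionError`; here it is injected so the logic can be tested without AWS.

```python
from typing import Callable, Optional

# Hypothetical interface: assume(role_arn, external_id) returns on success and
# raises PermissionError when STS denies the call. A boto3 wrapper would omit
# ExternalId when it is None and map an AccessDenied ClientError to PermissionError.
AssumeFn = Callable[[str, Optional[str]], object]


def trust_policy_enforces_external_id(assume: AssumeFn, role_arn: str,
                                      external_id: str) -> bool:
    """Return True only if the role accepts our external ID AND rejects no ID."""
    try:
        assume(role_arn, external_id)  # check 1: the proper challenge works
    except PermissionError:
        return False
    try:
        assume(role_arn, None)         # check 2: without it, STS must refuse
    except PermissionError:
        return True
    return False                       # condition missing: confused deputy risk
```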
[20:04.380 --> 20:09.060]  So essentially, you need to check twice as a vendor, you need to check that you
[20:09.060 --> 20:15.600]  can do the assume role with the external ID. But you also need to check that it
[20:15.600 --> 20:21.680]  fails if you do not provide the external ID. That's the only way that you can be sure that
[20:21.680 --> 20:28.020]  the customer really wrote the trust policy correctly with the condition there. So you need
[20:28.020 --> 20:33.480]  to check those two things. And that's very important. The other thing that's really important
[20:34.030 --> 20:41.820]  is what privileges the SaaS vendor will give to the worker. I'm using this general term to
[20:41.820 --> 20:46.600]  describe the instances, the lambdas, the containers, or the human beings that will
[20:46.600 --> 20:55.560]  be able to use those credentials to assume the roles and do API calls on the customer accounts.
[20:55.620 --> 21:01.840]  There are no conditions involved, I'm sorry. And so, since you do not know in advance
[21:02.280 --> 21:10.420]  what all of the account IDs of all the customers are, you might be tempted to do the first version
[21:10.420 --> 21:17.360]  here on the top that has Drake pretty sad. And so asterisks are bad, especially if they're by
[21:17.360 --> 21:22.720]  themselves, they get really angry when they're alone. And so having a resource asterisk there
[21:22.720 --> 21:28.580]  is a horrible idea, please do not do that. Basically, what this is doing is allowing that
[21:28.580 --> 21:36.620]  worker to have unrestricted STS assume role to any role on any account that has a trust policy
[21:36.620 --> 21:43.180]  trusting your account, but also to any role inside the SaaS vendor account as well.
[21:43.180 --> 21:48.140]  So if you have a role lying around that allows for administrative privileges,
[21:48.140 --> 21:53.870]  and that will be the case if your account is part of an AWS organization,
[21:56.270 --> 22:01.310]  then whoever gets those privileges, or if an attacker breaks into your worker,
[22:01.310 --> 22:06.970]  if that human being is a malicious insider, or their laptop or credentials are stolen or
[22:06.970 --> 22:12.170]  something like that, then an attacker can easily use this to escalate privileges and take over
[22:12.170 --> 22:17.790]  the vendor account. So that's a really bad idea. You want to do something that looks like
[22:17.790 --> 22:24.530]  the second version there, where you have standardized the role names on the customers.
[22:24.630 --> 22:31.010]  So that already mitigates a lot of risk. Because even if you just had the allow statement there,
[22:31.010 --> 22:38.930]  it would only allow a role with that name in the SaaS vendor account to be assumed. But I would
[22:38.930 --> 22:45.810]  also add that deny, which then covers that possibility. So that second version is allowing you
[22:45.810 --> 22:51.770]  to do STS assume role only to roles, in that case called SaaS cross account role,
[22:51.770 --> 22:59.430]  you can choose whatever name you want, but on any account other than your own account.
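A sketch of that second, standardized-name policy for the worker identity, expressed as a Python dictionary, plus a rough evaluator. The role name `SaaSCrossAccountRole` and the account ID are illustrative, and `roughly_allowed` uses `fnmatch` wildcards as a simplified stand-in for IAM's matching and evaluation logic (explicit deny wins, otherwise an allow must match); it is not the real policy engine.

```python
from fnmatch import fnmatchcase

VENDOR_ACCOUNT_ID = "111122223333"           # hypothetical vendor account
CROSS_ACCOUNT_ROLE = "SaaSCrossAccountRole"  # illustrative standardized role name

WORKER_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Only the standardized role name, in any account...
            "Sid": "AssumeCustomerRoles",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": f"arn:aws:iam::*:role/{CROSS_ACCOUNT_ROLE}",
        },
        {   # ...but never a role inside the vendor's own account.
            "Sid": "DenyOwnAccountRoles",
            "Effect": "Deny",
            "Action": "sts:AssumeRole",
            "Resource": f"arn:aws:iam::{VENDOR_ACCOUNT_ID}:role/*",
        },
    ],
}


def roughly_allowed(policy: dict, role_arn: str) -> bool:
    """Simplified stand-in for IAM evaluation of sts:AssumeRole on role_arn."""
    allowed = False
    for stmt in policy["Statement"]:
        if stmt["Action"] == "sts:AssumeRole" and fnmatchcase(role_arn, stmt["Resource"]):
            if stmt["Effect"] == "Deny":
                return False  # an explicit deny always wins
            allowed = True
    return allowed
```

Under this model, a customer's `SaaSCrossAccountRole` is reachable, while any role in the vendor's own account, or any other role name in a customer account, is not.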
[23:00.270 --> 23:07.050]  And so that would prevent you from overprivileging your workers and allowing an attacker or a
[23:07.050 --> 23:12.810]  malicious insider to do a privilege escalation. So that's really important to keep in mind.
[23:14.790 --> 23:21.890]  It would have been pretty nice if you were able to use SCPs to do something similar,
[23:21.890 --> 23:27.430]  to create an SCP that says the only identity within the SaaS vendor account that can do assume
[23:27.430 --> 23:33.690]  roles for other accounts is this lambda or this container task or whatever.
[23:33.930 --> 23:39.690]  But I haven't been able to do it. And the reason is this code here on the left would have worked
[23:39.810 --> 23:45.750]  a treat, I think, and probably a lot of people that know a lot more than I do will correct me
[23:45.750 --> 23:54.190]  soon on this. But here's the problem. For some weird reason that I have no idea about,
[23:54.190 --> 24:01.670]  SCPs do not support the NotResource syntax. They only support Resource. They support NotAction,
[24:01.670 --> 24:08.390]  but they do not support NotResource. So I can't say deny anything that's not in my
[24:08.390 --> 24:19.330]  account. That can't be done on an SCP. So why? I mean, why? In any case, that means that you need
[24:19.330 --> 24:23.890]  to protect, as you should anyway, your root account privileges, for example, or anyone with
[24:23.890 --> 24:29.250]  the administrator privileges, because they will still be able to assume roles on all your customers
[24:29.250 --> 24:37.770]  as well if they have those privileges on the SaaS vendor account. CloudTrail is always your friend.
[24:37.770 --> 24:45.690]  You can see those STS assume role activities happening on the customer account as well.
[24:45.690 --> 24:51.170]  This will be logged there. So if you are a vendor or if you are a customer, you need to monitor
[24:51.170 --> 24:56.910]  those activities and make sure that you're not seeing something weird. So for example, any
[24:56.910 --> 25:04.610]  attempts that fail, that are trying to use a different external ID. Or, once you know
[25:05.250 --> 25:11.930]  which identity, which temporary credentials were assigned to a vendor, you can look at
[25:11.930 --> 25:19.150]  what API calls they're making with those temporary credentials later on and try to find
[25:19.150 --> 25:24.070]  patterns, especially for things that are blocked. If you have a vendor that asks you for
[25:24.070 --> 25:31.190]  the privileges they need, and they have automated tasks running, it shouldn't, I think, have a lot
[25:31.190 --> 25:35.430]  of blocked activities. They shouldn't be trying to do things that they didn't ask
[25:35.430 --> 25:39.970]  you for the privileges to do. So there are lots of things that you can very easily monitor that should
[25:39.970 --> 25:46.450]  be low volume, high impact events that you can monitor using CloudTrail. So not only the STS
[25:46.450 --> 25:51.270]  assume role operations themselves, and you can see that the external ID is not a secret because it's
[25:51.270 --> 25:58.250]  explicitly included on the request parameters there. AWS does not consider that to be secret.
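A minimal Python sketch of that monitoring on the customer side, assuming CloudTrail records have already been loaded as dictionaries (for example from the trail's S3 bucket or a SIEM). The field names `eventSource`, `eventName`, `errorCode`, `userIdentity.type`, and `requestParameters.roleArn`/`externalId` follow the CloudTrail record format; the `expected_external_ids` inventory and the finding strings are hypothetical, and failed attempts are only reported if those records actually reach your trail.

```python
from typing import Dict, Iterable, List


def review_assume_role_events(events: Iterable[dict],
                              expected_external_ids: Dict[str, str]) -> List[str]:
    """Flag cross-account AssumeRole records worth a look.

    `expected_external_ids` maps role ARN -> the external ID agreed with the
    vendor (a hypothetical inventory you maintain).
    """
    findings = []
    for ev in events:
        if ev.get("eventSource") != "sts.amazonaws.com" or ev.get("eventName") != "AssumeRole":
            continue
        # Calls from another account are recorded with userIdentity.type "AWSAccount".
        if (ev.get("userIdentity") or {}).get("type") != "AWSAccount":
            continue
        params = ev.get("requestParameters") or {}
        role_arn = params.get("roleArn", "<unknown role>")
        external_id = params.get("externalId")  # logged in the clear: not a secret
        if ev.get("errorCode"):
            findings.append(f"failed AssumeRole on {role_arn}: {ev['errorCode']}")
        elif role_arn not in expected_external_ids:
            findings.append(f"unknown cross-account access to {role_arn}")
        elif external_id != expected_external_ids[role_arn]:
            findings.append(f"unexpected external ID on {role_arn}: {external_id!r}")
    return findings
```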
[25:59.190 --> 26:04.390]  And so monitoring the STS assume role operations themselves, maybe you find out about cross
[26:04.390 --> 26:09.930]  account access that you didn't know existed, because you see it in the logs in your SIEM,
[26:09.930 --> 26:16.030]  or you keep track of the identities assigned to the vendor, and then you see what other
[26:16.030 --> 26:20.750]  activities they performed. And especially you pay attention to things that might indicate
[26:20.750 --> 26:29.390]  enumeration, like reconnaissance, and denied activities, which again, for automated workloads,
[26:29.390 --> 26:34.670]  probably shouldn't happen that often, right? And probably worth your time investigating.
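Following the vendor's temporary credentials from issuance to use could look like the Python sketch below. It assumes, as in CloudTrail's AssumeRole records, that the issued key appears under `responseElements.credentials.accessKeyId` and that later calls carry it in `userIdentity.accessKeyId`; the set of denial error codes is a non-exhaustive assumption.

```python
from collections import Counter
from typing import Iterable, Set

# Non-exhaustive set of error codes that indicate an authorization failure.
DENIED_CODES = {"AccessDenied", "AccessDeniedException", "Client.UnauthorizedOperation"}


def vendor_session_key_ids(events: Iterable[dict]) -> Set[str]:
    """Access key IDs of temporary credentials handed out to another account."""
    keys = set()
    for ev in events:
        if (ev.get("eventName") == "AssumeRole" and not ev.get("errorCode")
                and (ev.get("userIdentity") or {}).get("type") == "AWSAccount"):
            creds = (ev.get("responseElements") or {}).get("credentials") or {}
            if creds.get("accessKeyId"):
                keys.add(creds["accessKeyId"])
    return keys


def denied_calls_by_vendor_sessions(events: Iterable[dict]) -> Counter:
    """Count denied API calls, per (service, action), made with those credentials."""
    events = list(events)  # iterated twice
    keys = vendor_session_key_ids(events)
    denied = Counter()
    for ev in events:
        key_id = (ev.get("userIdentity") or {}).get("accessKeyId")
        if key_id in keys and ev.get("errorCode") in DENIED_CODES:
            denied[(ev.get("eventSource"), ev.get("eventName"))] += 1
    return denied
```

For an automated integration, a non-empty result here is exactly the low-volume, high-impact signal worth investigating.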
[26:35.630 --> 26:42.330]  And so I want to highlight the research from Kesten Broughton, I hope I'm pronouncing that
[26:42.330 --> 26:50.930]  right, which is amazing. He looked at this, and he said, how many vendors are doing this right?
[26:50.930 --> 26:57.910]  And so he tested about 200 vendors, which is awesome. This is a lot of vendors. And to my
[26:57.910 --> 27:02.350]  surprise, he found that despite all of the advantages we just described of using the assume
[27:02.350 --> 27:10.430]  role method, about 50% of the vendors he tested chose to work exclusively with IAM user credentials,
[27:10.430 --> 27:16.710]  long-lived credentials, they chose to continue having to deal in secrets with all of the
[27:16.710 --> 27:23.230]  disadvantages that we mentioned, which is to me like a wasted opportunity to do better, to be
[27:23.230 --> 27:33.310]  quite frank. And of the other 50%, the roughly 100 vendors that did use the assume role way of getting access
[27:33.310 --> 27:43.410]  to customer accounts, only 2% of them, so just two vendors, did this right. So 98% of them got this
[27:43.410 --> 27:51.510]  wrong. Only those two actually did that second check to see if the assume role failed when they didn't provide the
[27:51.510 --> 27:59.150]  external ID. So only two of them, two out of 100, actually checked if the customer configured the
[27:59.150 --> 28:06.390]  policy correctly, which is kind of depressing if you ask me. And so the thing here is that
[28:06.990 --> 28:13.030]  they would accept a customer configuring things incorrectly, and then essentially the
[28:13.630 --> 28:20.130]  confused deputy problem is still there. 37% of them don't understand what external IDs are,
[28:20.130 --> 28:26.390]  and they explicitly, by design, allow customers to pick their own external IDs on the UI
[28:26.390 --> 28:33.670]  when they're onboarding their integration into the SaaS vendor. And so this defeats
[28:33.670 --> 28:39.970]  the whole purpose, the whole purpose of having the external ID in the first place. It's as if
[28:39.970 --> 28:47.150]  Let's Encrypt said, "configure any TXT record you want to prove you own this domain." That's
[28:47.150 --> 28:54.530]  not how it works. Another 15% unwittingly allowed it to happen this way because of flaws:
[28:54.530 --> 28:59.570]  there was a way to bypass the UI and go straight to the API to do this, or some other vulnerability
[28:59.570 --> 29:08.470]  allowed Kasten to do it. So over half of the vendors, 52% of the vendors that used assumed
[29:08.470 --> 29:14.870]  roles had a way for the customer, or the attacker in this case, to just pick whatever external ID
[29:14.870 --> 29:19.970]  they wanted and essentially force the confused deputy problems to happen. So that's a bit
[29:19.970 --> 29:25.570]  depressing. That's amazing research. Check out his talk on Forward Cloud SAC where he delves into
[29:25.570 --> 29:32.630]  this into more detail. And privileges. Whenever you're giving people privileges, you need to be
[29:32.630 --> 29:38.970]  very wary of what those privileges are. You need to make sure you are using this privilege,
[29:38.970 --> 29:47.110]  otherwise you're going to be in a world of pain, right? And so cloud is pretty new, right? And IAM
[29:47.790 --> 29:55.790]  for AWS is really complicated. I'm pretty sure if someone, you know, works out the mathematical
[29:55.790 --> 30:01.490]  proofs, it's going to be Turing complete at this point, which makes me particularly mad that they didn't
[30:01.490 --> 30:10.310]  support, you know, NotResource in SCPs because, you know, anyway. But you should, of course, minimize
[30:10.310 --> 30:19.290]  the privileges that you are assigning in this vendor-customer relationship. And it's in the interest
[30:19.290 --> 30:23.390]  of both parties involved, right? As a customer, of course, you want to minimize the amount of
[30:23.390 --> 30:27.770]  privileges anyone has from outside into your environment. But as a vendor, you don't want
[30:27.770 --> 30:34.170]  that liability either, right? If you accidentally ask for more privileges than you need, you can
[30:34.170 --> 30:44.110]  potentially be blamed for illegal access to, I don't know, credit card data, PII, trade secrets,
[30:44.110 --> 30:50.730]  all sorts of things, and that can cause you a lot of problems as far as legal liability,
[30:50.730 --> 30:58.270]  fines, and all sorts of fun things. So it's in the interest of both parties to keep those
[30:58.270 --> 31:03.890]  privileges to a minimum, right? And so, of course, everyone does that, right?
[31:06.070 --> 31:14.330]  No, not really, no. And so Ben Reeser, and again, I hope I'm saying this right, published this
[31:14.330 --> 31:21.070]  tweet recently, I think last week, and it was very timely for my talk. Thank you very much, Ben.
[31:21.070 --> 31:27.010]  Essentially, he's a user of Okta, which is a publicly traded company.
[31:27.550 --> 31:34.970]  It's a SaaS identity provider, right? And a lot of large corporations use Okta for identity
[31:35.550 --> 31:43.310]  management. And they were asking for very, very overbroad privileges there; they were asking
[31:43.310 --> 31:50.410]  to be able to assume any role inside your account. You can see how that's problematic,
[31:50.410 --> 31:58.710]  right? I mean, an IdP would need to have pretty high privileges, right? Anyway, but this is just
[31:58.710 --> 32:06.090]  too much, right? This can very easily be abused to gain admin level on any AWS account,
[32:06.090 --> 32:13.150]  especially again, if you are using AWS Organizations, you should write an SSL.
[32:13.150 --> 32:21.710]  And so it's a pretty serious problem. I have not been involved in Ben's disclosure process,
[32:21.710 --> 32:27.110]  nor have I talked directly to Okta about this. What I hear from Ben, and I asked for his
[32:27.110 --> 32:34.590]  permission before sharing this and his opinion on this, is that Okta has been very slow to respond.
[32:34.950 --> 32:40.310]  He's spent six months trying to convince them that this is serious. So far, they have updated their
[32:40.310 --> 32:47.030]  documentation. But as far as Ben knows, or from what he told me, they have not notified existing
[32:47.030 --> 32:53.990]  customers to change and reduce those overly permissive roles that have been assigned.
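As a rough illustration of what "overly permissive" means here, the sketch below flags a permissions policy that allows `sts:AssumeRole` or `iam:ListRoles` on every resource, the combination described above. It is a minimal Python sketch under stated assumptions: the action list is illustrative rather than exhaustive, only a literal `"*"` resource is treated as broad, and `NotAction`/`NotResource` statements are ignored. The function name is hypothetical.

```python
import fnmatch
from typing import Any

# Illustrative (not exhaustive): with these on every resource, the trusted
# account can discover and then hop into arbitrary roles.
RISKY_ACTIONS = ["sts:assumerole", "iam:listroles"]


def _as_list(value: Any) -> list:
    """IAM allows a single string or a list in Action/Resource/Statement."""
    return value if isinstance(value, list) else [value]


def overbroad_grants(policy: dict) -> list[str]:
    """Return risky actions that an IAM policy document allows on Resource "*"."""
    findings = set()
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        if "*" not in _as_list(stmt.get("Resource", [])):
            continue  # only literal "*" counts as broad in this sketch
        for pattern in _as_list(stmt.get("Action", [])):
            for action in RISKY_ACTIONS:
                # IAM action names are case-insensitive and support * and ? wildcards.
                if fnmatch.fnmatchcase(action, pattern.lower()):
                    findings.add(action)
    return sorted(findings)
```

For a hypothetical statement allowing `["sts:AssumeRole", "iam:ListRoles"]` on `"*"`, this returns `["iam:listroles", "sts:assumerole"]`; scoping `Resource` to the specific role ARNs the vendor needs makes it return an empty list.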
[32:53.990 --> 32:59.230]  So this is the SaaSpocalypse scenario right here. This is a large company with thousands of
[32:59.230 --> 33:06.610]  customers, potentially tens of thousands of AWS accounts connected to them, and an attacker who's
[33:06.610 --> 33:12.150]  able to break into the Okta AWS account that's trusted by all those companies
[33:12.770 --> 33:19.130]  will very easily be able to find a role, because they have ListRoles permissions, by the way,
[33:19.130 --> 33:25.110]  and then assume a role with as many privileges as they need, and then take over Okta's customers'
[33:25.110 --> 33:31.390]  accounts. So they did get the external ID right, though, but the privileges, not so much.
[33:31.930 --> 33:37.650]  If you are an Okta customer, look at their updated documentation, update your environment.
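For customers reviewing their own environment, a first pass is listing which outside accounts each role trusts and whether the trust policy requires `sts:ExternalId`. A minimal sketch, assuming you have already fetched the trust policy documents (for example from IAM) as Python dicts; the helper names are hypothetical, and it only checks that the condition key is present, not that its value is a strict, unique match.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """IAM policy fields may hold a single value or a list of values."""
    return value if isinstance(value, list) else [value]


def _account_of(principal: str) -> str:
    """Account ID from "123456789012" or an ARN like arn:aws:iam::123456789012:root."""
    return principal.split(":")[4] if principal.startswith("arn:") else principal


def _requires_external_id(condition: dict) -> bool:
    """True if any condition operator block mentions sts:ExternalId (keys are case-insensitive)."""
    return any(key.lower() == "sts:externalid" for block in condition.values() for key in block)


def audit_trust_policy(trust_policy: dict, own_account: str) -> list[dict]:
    """List outside AWS principals a role trusts and whether an external ID is required."""
    findings = []
    for stmt in _as_list(trust_policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        # Principal is either the string "*" or a dict like {"AWS": ...} / {"Service": ...}.
        aws = [principal] if isinstance(principal, str) else _as_list(principal.get("AWS", []))
        for entry in aws:
            account = _account_of(entry)
            if account == own_account:
                continue  # same-account trust is out of scope for this check
            findings.append({
                "principal": entry,
                "account": account,
                "external_id_required": _requires_external_id(stmt.get("Condition", {})),
            })
    return findings
```

Running this across every role and comparing the `account` values against your list of current vendors also surfaces access that was never revoked after a contract ended.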
[33:37.990 --> 33:44.090]  If you are from Okta, and I'm wrong here, and you are working on this, you might just be
[33:44.090 --> 33:49.190]  waiting for legal approval or going through some extra hoops internally before you can notify
[33:49.190 --> 33:54.870]  customers, etc., which I know nothing about. I can only assume there are very talented,
[33:54.870 --> 34:02.950]  competent security and legal people at Okta. So, if you are that person
[34:02.950 --> 34:07.450]  and you have not been involved in this, talk to them and get this moving, because there's a lot of potential
[34:07.450 --> 34:15.530]  risk involved. And, I mean, Chris Farris found essentially the same thing,
[34:15.530 --> 34:20.490]  only for network control: a vendor that didn't know how to specify less permissive
[34:21.290 --> 34:26.950]  roles essentially asked for, you know, "let me manage all of your VPC." What could possibly go
[34:26.950 --> 34:35.810]  wrong? What I have found really, really often, though, is that a lot of vendors ask for the ReadOnlyAccess
[34:35.810 --> 34:42.810]  AWS managed policy. And it seems harmless enough, and it's a start, because at
[34:42.810 --> 34:47.950]  least they're not asking to be able to change things in your environment, so yay, I guess.
[34:47.950 --> 34:53.690]  But the problem is that ReadOnlyAccess is very misleading. If you actually look at the
[34:53.690 --> 34:59.390]  privileges it's providing, it allows you to do things like read all S3 objects in every bucket
[34:59.390 --> 35:05.240]  in your environment, list all buckets, perform queries on databases like DynamoDB,
[35:05.870 --> 35:11.370]  and other really dangerous things that would give a vendor access to company data.
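To make the read-only problem concrete, the sketch below checks which of a few data-plane actions (reading S3 objects, querying DynamoDB) would be matched by a list of granted action patterns, using IAM-style `*` and `?` wildcards. The action list and the example patterns are illustrative assumptions, not the actual contents of ReadOnlyAccess or ViewOnlyAccess; check the current managed policy versions yourself.

```python
import fnmatch

# A few data-plane actions (illustrative, not exhaustive) that expose
# customer data rather than configuration metadata.
DATA_ACTIONS = [
    "s3:GetObject",
    "dynamodb:GetItem",
    "dynamodb:Query",
    "dynamodb:Scan",
]


def data_access_granted(action_patterns: list[str]) -> list[str]:
    """Return which DATA_ACTIONS are matched by any of the granted action patterns."""
    granted = []
    for action in DATA_ACTIONS:
        for pattern in action_patterns:
            # IAM action matching is case-insensitive; * and ? are wildcards.
            if fnmatch.fnmatchcase(action.lower(), pattern.lower()):
                granted.append(action)
                break
    return granted
```

With hypothetical read-style patterns such as `["s3:Get*", "dynamodb:Query", "ec2:Describe*"]`, this returns `["s3:GetObject", "dynamodb:Query"]`: a "read-only" grant that still reaches your data.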
[35:11.470 --> 35:16.900]  And I've seen this requested by vendors like, you know, cloud cost optimization
[35:17.540 --> 35:22.380]  or CSPM providers. They have no business getting access to your data. They only need to see,
[35:22.380 --> 35:27.620]  like, your billing, or even how things are configured, right? But they shouldn't have access to your
[35:27.620 --> 35:34.960]  actual confidential data, right? So one quick replacement here would be to use the ViewOnlyAccess
[35:34.960 --> 35:40.800]  policy, which is a much better policy in that regard. But, you know, apply least privilege as
[35:40.800 --> 35:49.460]  much as possible; the existing managed policies are not necessarily going to be best suited to
[35:49.460 --> 35:57.240]  what you need, right? So in closing, what do I recommend? If you're a customer, you know,
[35:57.240 --> 36:02.220]  you need to be aware of the third party access to your AWS accounts. You need to have governance
[36:02.220 --> 36:07.820]  over it. You need a process to do regular reviews, not only at initial onboarding, because those
[36:08.320 --> 36:12.060]  privileges only grow, they never diminish. So you need to have regular reviews
[36:12.060 --> 36:17.660]  of what actual privileges are being given to third parties. Ensure there's still a business need. In
[36:17.660 --> 36:23.440]  our consulting practice, we have many times seen companies that no longer had a contract with
[36:23.440 --> 36:29.920]  our clients, but the access they were given was never revoked. So people need to go account by
[36:29.920 --> 36:33.800]  account and revoke that. So you need to be looking for that, and you can automate it by looking at
[36:33.800 --> 36:40.340]  trust policies and so on and so forth. There's an amazing tool written by Kinnaird McQuade called
[36:40.340 --> 36:47.180]  Cloudsplaining that will go through your accounts and look for overprivilege and things that
[36:47.180 --> 36:53.060]  can be abused to exfiltrate data or to do privilege escalation. It's automated. It's awesome. Use it.
[36:53.060 --> 36:59.040]  And Parliament by Scott Piper can also help you do some QA of policies to make sure required
[36:59.040 --> 37:03.740]  things are not missing, like your external ID, or that you're not accidentally using something that doesn't
[37:03.740 --> 37:10.260]  work as intended. And monitor using CloudTrail. If you're a vendor, right,
[37:10.840 --> 37:14.760]  you have the brunt of the work. You have the biggest responsibility here, right? And you
[37:14.760 --> 37:19.440]  have the biggest liability, probably, because you carry the sum of the liability of all your customers.
[37:20.100 --> 37:24.080]  Minimize as much as you can the privileges asked of customers. And I know this is a balancing act,
[37:24.080 --> 37:27.500]  because if you ask for too few privileges, then when you do need more privileges,
[37:27.500 --> 37:31.180]  you have to go back to customers and ask them to change things, which is bad.
[37:31.180 --> 37:36.380]  But I would err on the side of caution. But then again, that's why I work in security.
[37:37.060 --> 37:43.220]  Make sure you're using external IDs correctly. If you can't easily enforce that the external
[37:43.220 --> 37:50.220]  IDs are unique, use a random UUID. It's such a big key space that statistically you probably
[37:50.220 --> 37:55.940]  won't be in trouble. Ensure customers are correctly implementing the trust policy.
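Putting the external ID advice together, a vendor can generate the ID itself as a random UUID and then run the "second check" described earlier before trusting a new integration. This is a minimal sketch, not any vendor's actual implementation: `assume` is a hypothetical callable you would implement (for instance around boto3's `sts.assume_role(RoleArn=..., RoleSessionName=..., ExternalId=...)`), returning True on success and False on AccessDenied.

```python
import uuid
from typing import Callable, Optional

# Hypothetical interface: attempt sts:AssumeRole on role_arn with the given
# external ID (None means "omit it"); return True on success, False if denied.
AssumeFn = Callable[[str, Optional[str]], bool]


def new_external_id() -> str:
    """Vendor-generated external ID (random UUID4); never let customers pick it."""
    return str(uuid.uuid4())


def verify_onboarding(assume: AssumeFn, role_arn: str, external_id: str) -> str:
    """Check that the customer's trust policy actually enforces the issued external ID."""
    # 1. The role must be assumable with the external ID we issued.
    if not assume(role_arn, external_id):
        return "cannot assume role with the issued external ID"
    # 2. The check most vendors skip: without an external ID, it must fail.
    if assume(role_arn, None):
        return "reject: trust policy does not require an external ID"
    # 3. A different, freshly generated ID must also fail (catches wildcard conditions).
    if assume(role_arn, new_external_id()):
        return "reject: trust policy accepts arbitrary external IDs"
    return "ok"
```

In tests, `assume` can be stubbed: `lambda arn, eid: eid is not None` models a trust policy with a wildcard external ID condition and is rejected at step 3.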
[37:55.940 --> 38:02.360]  I would say standardize the role names as well and use CloudFormation, so the customers don't
[38:02.360 --> 38:07.620]  have to manually set up anything. They just deploy a CloudFormation stack and everything is set
[38:07.620 --> 38:13.800]  up for them. There are plenty of examples online if you want to look them up, or I can provide one
[38:13.800 --> 38:19.540]  later if anyone requests. Limit the attack surface of that account that everyone trusts.
[38:19.540 --> 38:23.900]  You need to run as few things as possible. You need to have as few humans as possible
[38:23.900 --> 38:29.840]  that have access to that one account that every customer is trusting. You need to reduce that
[38:29.840 --> 38:38.440]  attack surface as much as you can. Because if that one account gets broken into, then that's
[38:38.540 --> 38:42.940]  a very bad day for you. The upside is if you're using cross-account access, as soon as you kick
[38:42.940 --> 38:47.680]  the attacker out, they can no longer access your customers. And your customers don't need to do
[38:47.680 --> 38:53.780]  anything themselves. But still, the value for an attacker of breaking into that one trusted
[38:53.780 --> 38:59.800]  account is huge because of all that concentrated access. So you need to be extra careful.
[38:59.920 --> 39:05.380]  Separate it from everything else that you can. And monitor extensively what's happening in that
[39:05.380 --> 39:11.180]  account using CloudTrail. Monitoring is going to be a lot easier if you have fewer things running on it.
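Monitoring that trusted account can start small: if only one service role is supposed to call `sts:AssumeRole` into customer accounts, anything else doing so is worth an alert. A minimal sketch over CloudTrail log records, assuming your trail in the trusted account captures these AssumeRole calls; the allowlist and function names are hypothetical, and in practice such a rule would more likely live in your alerting pipeline than in a standalone script.

```python
import gzip
import json
from collections import Counter


def load_records(path: str) -> list[dict]:
    """Read one CloudTrail log file (gzipped JSON with a top-level "Records" list)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh).get("Records", [])


def unexpected_assume_role_callers(records: list[dict], allowed: set[str]) -> Counter:
    """Count sts:AssumeRole calls made by principals that are not on the allowlist."""
    counts: Counter = Counter()
    for rec in records:
        if rec.get("eventSource") != "sts.amazonaws.com" or rec.get("eventName") != "AssumeRole":
            continue
        identity = rec.get("userIdentity", {})
        # Role sessions have per-session ARNs; the issuing role's ARN is the stable identity.
        caller = (
            identity.get("sessionContext", {}).get("sessionIssuer", {}).get("arn")
            or identity.get("arn", "unknown")
        )
        if caller not in allowed:
            counts[caller] += 1
    return counts
```

The fewer workloads and humans that legitimately touch the trusted account, the shorter the allowlist and the more meaningful every hit becomes.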
